Stokes (2016) uses a natural experiment to demonstrate how citizens punish incumbent governments for progressive climate policy, leading to electoral losses for the incumbent party ranging from 4 to 10%, with the effect persisting 3 km from wind turbines.
We will first create fake data to simulate the operation of an instrumental variable. Using dagitty.net, create a DAG with a treatment, \(T\), that links to an outcome, \(Y\). Add a confounder, \(U\), with causal links to both the treatment and the outcome. Finally, add a binary instrument, \(Z\in\{0,1\}\), which links to the treatment alone.
Now create fake data, simulating the data generating process represented by your DAG. You will need to specify your model. For example, consider the following model (you may choose other, perhaps more complex models):
\[ \begin{aligned} Z &\sim \text{Bernoulli}\left(p=\tfrac{1}{2}\right) \\ U &\sim \mathcal{N}(\mu=0, \sigma=1) \\ T &\sim \mathcal{N}(\mu=\alpha_1\cdot Z +\alpha_2\cdot U, \sigma=1) \\ Y &\sim \mathcal{N}(\mu=\beta_1\cdot T+\beta_2\cdot U, \sigma=1) \end{aligned} \]
Now let us explore our data as follows:
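One way to simulate this data generating process is sketched below in base R. The sample size, the seed, and the coefficient values (alpha1, alpha2, beta1, beta2) are arbitrary assumptions chosen for illustration; they are not given in the exercise. The sketch compares a naive OLS slope, which is confounded by \(U\), with the simple ratio (Wald) form of the IV estimator, and calls ivreg only if the AER package happens to be installed.

```r
set.seed(42)                 # arbitrary seed, for reproducibility only
n <- 10000                   # assumed sample size

# Assumed coefficients (not from the exercise); beta1 is the true effect of T on Y
alpha1 <- 1; alpha2 <- 1
beta1  <- 2; beta2  <- 3

# Draw each variable in the causal order implied by the DAG
sim   <- data.frame(Z = rbinom(n, size = 1, prob = 0.5),  # binary instrument
                    U = rnorm(n, mean = 0, sd = 1))       # unobserved confounder
sim$T <- rnorm(n, mean = alpha1 * sim$Z + alpha2 * sim$U, sd = 1)
sim$Y <- rnorm(n, mean = beta1 * sim$T + beta2 * sim$U, sd = 1)

# Naive OLS ignores U, so its slope is biased away from beta1
ols_naive <- unname(coef(lm(Y ~ T, data = sim))["T"])

# IV (Wald) estimator with a single instrument: cov(Y, Z) / cov(T, Z)
iv_wald <- cov(sim$Y, sim$Z) / cov(sim$T, sim$Z)

print(c(true_effect = beta1, ols_naive = ols_naive, iv_wald = iv_wald))

# The same IV estimate via AER, if the package is available
if (requireNamespace("AER", quietly = TRUE)) {
  iv_fit <- AER::ivreg(Y ~ T | Z, data = sim)
  print(coef(iv_fit))
}
```

With these assumed values the naive slope should sit well above beta1, while the IV estimate should land close to it; changing alpha1 shows how a weaker instrument makes the IV estimate noisier.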
Use the ivreg function from the AER library. Note that the ivreg formula syntax is Y ~ T | Z, where T is the treatment and Z is the instrument.
We will now replicate an article in which Leah Stokes (2016) examines whether governments are punished electorally for building wind farms, a policy that mitigates climate change but may impose costs on the communities where turbines are sited. She looks at Ontario in Canada, where from 2009 the provincial government removed local communities’ rights to make planning decisions on the building of wind turbines. Instead, decision-making was centralised and turbines were imposed by the government. The government chose to build turbines in places where their construction was most feasible and they would generate the most electricity. In particular, turbines were more likely to be sited in places with higher prevailing wind speeds. Whilst certain broad areas are in general better suited for turbines (more rural and more elevated places, and areas closer to the windy Great Lakes), she argues that within these broad areas wind speed varies at random at the local level. This means that local communities could not select out of (or into) receiving a wind farm based on their levels of support for the policy or for the government. This is therefore a natural experiment where wind speed is an instrument that randomly encouraged the government to site turbines in particular places.
Her outcome of interest is change in support for the incumbent government from 2007 (before the wind farm policy) to 2011 (after it began) at a highly localised level known as “precincts” in Canada, which typically contain around 300 voters. Using GIS software, she geo-located all wind turbines that were built or proposed in the period and matched them to precincts, where she collected voting data, localised prevailing wind speeds, and background covariates. The dataset for this question can be downloaded here. It contains the following variables:
| Variable | Description |
|---|---|
| chng_lib | outcome: pp change in support for the incumbent government, 2007-11 |
| prop_3km | treatment: =1 if a wind turbine was built or proposed within 3km, 0 otherwise |
| avg_pwr_log | instrument: prevailing wind speed in the precinct, logged |
| longitude | of the precinct |
| latitude | of the precinct |
| ed_id | the broader district within which the precinct is located |
| mindistlake | distance to the Great Lakes in km |
| mindistlake sq | distance to the Great Lakes in km, squared |
Using data visualization, investigate the distribution of variables of interest, and the relationship between them. Create a DAG as a theoretical model, representing the system, and include it in your report. Does the data suggest that the instrument is relevant and valid? Justify your claim.
Assess whether wind speed can be considered to be as-if randomly assigned geographically, by regressing the instrument on all of the geographical covariates. What do you conclude? Is that problematic in terms of relevance or validity of the instrument?
Code Hint: Remember to use factor() for the ed_id variable
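The regression of the instrument on the geographic covariates could be set up roughly as follows. This is only a sketch: the data frame stokes below is a simulated placeholder with the same column names as the table above, not the real replication data, and all numbers in it are made up. The squared distance term is written as I(mindistlake^2) here as a stand-in for the squared-distance column in the dataset.

```r
set.seed(1)   # arbitrary seed
n <- 500      # placeholder sample size

# PLACEHOLDER data with the column names from the variable table (values are invented)
stokes <- data.frame(
  longitude   = runif(n, -83, -75),
  latitude    = runif(n, 42, 46),
  ed_id       = sample(1:10, n, replace = TRUE),
  mindistlake = runif(n, 0, 100)
)
stokes$avg_pwr_log <- 5 - 0.01 * stokes$mindistlake + rnorm(n, sd = 0.3)

# Regress the instrument on all geographic covariates;
# factor() turns ed_id into a set of district dummies (fixed effects)
balance_fit <- lm(
  avg_pwr_log ~ longitude + latitude + mindistlake + I(mindistlake^2) + factor(ed_id),
  data = stokes
)

# Coefficients, standard errors, and the joint F-statistic for the covariates
print(summary(balance_fit))
```

With the real data, replace the placeholder data frame with the downloaded dataset; covariates that predict wind speed are the ones to consider conditioning on in later stages.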
Estimate the first-stage relationship between treatment and the instrument using a regression with no added covariates. Interpret the result precisely, and comment on whether you think it might be biased and if so, why.
Stokes actually estimates the first and second stages with a full set of geographic controls included. Why do you think she does this?
Estimate the first-stage relationship between the treatment and the instrument using a regression, this time with a full set of geographic controls. Interpret the result, and compare it to the estimate from the earlier first-stage regression with no added covariates, and to Stokes’ own estimates.
Estimate the (Local) Average Treatment Effect of the treatment on the outcome using two-stage least squares (2SLS) with avg_pwr_log as the instrument and the full set of geographic controls. Interpret the coefficient on the treatment and its statistical significance precisely.
Code Hints: Remember to use ivreg() in the AER library. Your code should take the form:
ivreg(outcome ~ treatment + covariates | instrument + covariates)
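To make the mechanics of this formula concrete, the sketch below runs the two stages by hand with lm() and then, if AER is installed, fits the same model with ivreg(). The data frame stokes is again a simulated placeholder using the column names from the variable table; every value in it, including the built-in effect size, is invented, so the output says nothing about the real results. The squared distance term is written as I(mindistlake^2) as a stand-in for the squared-distance column.

```r
set.seed(2)   # arbitrary seed
n <- 2000     # placeholder sample size

# PLACEHOLDER data with the column names from the variable table (values are invented)
stokes <- data.frame(
  longitude   = runif(n, -83, -75),
  latitude    = runif(n, 42, 46),
  ed_id       = sample(1:10, n, replace = TRUE),
  mindistlake = runif(n, 0, 100),
  avg_pwr_log = rnorm(n, mean = 5, sd = 0.5)
)
stokes$prop_3km <- rbinom(n, 1, plogis(-2 + 1.5 * (stokes$avg_pwr_log - 5)))
stokes$chng_lib <- -5 * stokes$prop_3km + rnorm(n, sd = 5)  # arbitrary made-up effect

controls <- "longitude + latitude + mindistlake + I(mindistlake^2) + factor(ed_id)"

# Stage 1: treatment on instrument plus controls
first_stage <- lm(as.formula(paste("prop_3km ~ avg_pwr_log +", controls)), data = stokes)
stokes$prop_hat <- fitted(first_stage)

# Stage 2: outcome on the fitted treatment plus the same controls.
# The point estimate is the 2SLS estimate, but these lm() standard errors are NOT valid.
second_stage <- lm(as.formula(paste("chng_lib ~ prop_hat +", controls)), data = stokes)
late_manual  <- unname(coef(second_stage)["prop_hat"])
print(late_manual)

# ivreg() does both stages at once and reports correct standard errors
if (requireNamespace("AER", quietly = TRUE)) {
  iv_formula <- as.formula(
    paste("chng_lib ~ prop_3km +", controls, "| avg_pwr_log +", controls)
  )
  iv_fit <- AER::ivreg(iv_formula, data = stokes)
  print(summary(iv_fit))
}
```

Note that the controls appear on both sides of the | in the ivreg formula: they are treated as exogenous regressors that instrument for themselves, while avg_pwr_log instruments for prop_3km.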